pFilter: Global Information Filtering and Dissemination
Authors
Abstract
Due to the overwhelming amount of information on the Internet, it is becoming increasingly difficult for people to find relevant information in a timely fashion. Information filtering and dissemination systems allow users to register persistent queries called user profiles. They detect new content, match it against the profiles, and continuously notify users when relevant information becomes available. Existing systems, however, either are not scalable or do not support matching of unstructured documents. Unstructured documents, such as text, HTML, or multimedia files, account for a significant percentage of the content on the Internet. To address the limits of existing systems, we describe pFilter, a global-scale decentralized information filtering and dissemination system for unstructured documents. To handle potentially billions of documents for millions of subscribers, pFilter connects potentially millions of computers, in national (and international) computing Grids or on ordinary desktops, into a structured peer-to-peer overlay network. Nodes in the overlay collectively publish and collect documents, build indexes, register profiles, and filter and disseminate information. To enable efficient and accurate matching between profiles and documents without flooding either documents or profiles, profiles in the overlay are organized around their vector representations (based on modern information retrieval algorithms) such that the search space of a new document is organized around related profiles. In pFilter, we introduce a new application-level multicast algorithm that allows documents to be efficiently disseminated to a large number of interested parties.
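The abstract says profiles and documents are compared through vector representations but does not give the algorithm. As a rough illustration only, the sketch below uses a plain term-frequency vector space model with cosine similarity, run on a single machine. It is not pFilter's method: there is no overlay, no index, and no multicast here. The names `vectorize`, `cosine`, and `match_profiles`, and the threshold of 0.2, are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (term -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def match_profiles(document, profiles, threshold=0.2):
    """Return the users whose profile is similar enough to the document.

    `profiles` maps a user name to that user's persistent query text.
    The threshold is an arbitrary value chosen for illustration.
    """
    doc_vec = vectorize(document)
    matched = []
    for user, query in profiles.items():
        if cosine(doc_vec, vectorize(query)) >= threshold:
            matched.append(user)
    return sorted(matched)


if __name__ == "__main__":
    profiles = {
        "alice": "peer to peer overlay networks",
        "bob": "chocolate cake recipes",
    }
    new_document = "A structured peer to peer overlay for filtering"
    print(match_profiles(new_document, profiles))
```

This brute-force loop compares a new document against every profile, which is exactly the cost a decentralized design would want to avoid; the abstract's point is that organizing profiles by their vectors lets a new document be checked only against related profiles.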
Similar resources
pFilter: Global Information Filtering and Dissemination Using Structured Overlay Networks
The exponential data growth rate of the Internet makes it increasingly difficult for people to find desired information in a timely fashion. Information filtering and dissemination systems allow users to register persistent queries called user profiles, and notify users when relevant files become available. Existing such systems, however, either are not scalable, or do not support matching of u...
A New Approach to Filtering of XML Streaming Data
Information processing and retrieval in many applications needs filtering of the XML streams. A streamfilter system examines queries on a continuous stream of XML documents and delivers matched content to the user. This paper proposes a new algorithm named PFilter for stream filtering systems. The PFilter processes a large amount of XPath query expressions to provide the desired XML nodes. PFil...
Enabling Dissemination of Meta Information in the Usenet Framework
The paper discusses a transparent and flexible way to disseminate meta information within the global conferenci Examples of such information are ratings for Usenet articles or information about the behavior of other users. In describes how the Usenet "overview'' mechanism was modified to disseminate meta information to off-the-shelf U with the modified overview mechanism are discussed by exampl...
The PSI3 Agent Recommender System
This paper presents a multi-agent system (MAS) that implements a recommender system, essentially using collaborative filtering techniques. The design of the MAS is flexible to support the implementation of different filtering strategies and to control the global behavior of the system and its users. It has been applied in the PSI3 project to implement a personalized information dissemination se...
Efficient Filtering of XML Documents for Selective Dissemination of Information
Information Dissemination applications are gaining increasing popularity due to dramatic improvements in communications bandwidth and ubiquity. The sheer volume of data available necessitates the use of selective approaches to dissemination in order to avoid overwhelming users with unnecessary information. Existing mechanisms for selective dissemination typically rely on simple keyword matching ...